178 research outputs found

    Learning Forward Models for Robots

    No full text

    Exploiting Scene-specific Features for Object Goal Navigation

    Full text link
    Can the intrinsic relation between an object and the room in which it is usually located help agents in the Visual Navigation Task? We study this question in the context of Object Navigation, a problem in which an agent has to reach an object of a specific class while moving in a complex domestic environment. In this paper, we introduce a new reduced dataset that speeds up the training of navigation models, a notoriously complex task. Our proposed dataset permits the training of models that do not exploit online-built maps in reasonable times even without the use of huge computational resources. Therefore, this reduced dataset guarantees a significant benchmark and it can be used to identify promising models that could be then tried on bigger and more challenging datasets. Subsequently, we propose the SMTSC model, an attention-based model capable of exploiting the correlation between scenes and objects contained in them, highlighting quantitatively how the idea is correct.Comment: Accepted at ACVR2020 ECCV2020 Worksho

    Regression with Linear Factored Functions

    Full text link
    Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression-algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow to analytically solve certain integrals and to calculate point-wise products. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme, that learns factored basis functions during training. The novel regression algorithm performs competitively to Gaussian processes on benchmark tasks, and the learned LFF functions are with 4-9 factored basis functions on average very compact.Comment: Under review as conference paper at ECML/PKDD 201

    Hidden Markov Models and their Application for Predicting Failure Events

    Full text link
    We show how Markov mixed membership models (MMMM) can be used to predict the degradation of assets. We model the degradation path of individual assets, to predict overall failure rates. Instead of a separate distribution for each hidden state, we use hierarchical mixtures of distributions in the exponential family. In our approach the observation distribution of the states is a finite mixture distribution of a small set of (simpler) distributions shared across all states. Using tied-mixture observation distributions offers several advantages. The mixtures act as a regularization for typically very sparse problems, and they reduce the computational effort for the learning algorithm since there are fewer distributions to be found. Using shared mixtures enables sharing of statistical strength between the Markov states and thus transfer learning. We determine for individual assets the trade-off between the risk of failure and extended operating hours by combining a MMMM with a partially observable Markov decision process (POMDP) to dynamically optimize the policy for when and how to maintain the asset.Comment: Will be published in the proceedings of ICCS 2020; @Booklet{EasyChair:3183, author = {Paul Hofmann and Zaid Tashman}, title = {Hidden Markov Models and their Application for Predicting Failure Events}, howpublished = {EasyChair Preprint no. 3183}, year = {EasyChair, 2020}

    Self-Modification of Policy and Utility Function in Rational Agents

    Full text link
    Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby `escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and use the current utility function when evaluating the future.Comment: Artificial General Intelligence (AGI) 201

    The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

    Full text link
    We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regard- less of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to sub- optimal states can be removed. We show the natural decision problem associated to computing the NWR is coNP-complete. Finally, we ex- tend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR

    Lilia, A Showcase for Fast Bootstrap of Conversation-Like Dialogues Based on a Goal-Oriented System

    Get PDF
    International audienceRecently many works have proposed to cast human-machine interaction in a sentence generation scheme. Neural networks models can learn how to generate a probable sentence based on the user's statement along with a partial view of the dialogue history. While appealing to some extent, these approaches require huge training sets of general-purpose data and lack a principled way to intertwine language generation with information retrieval from back-end resources to fuel the dialogue with actualised and precise knowledge. As a practical alternative, in this paper, we present Lilia, a showcase for fast bootstrap of conversation-like dialogues based on a goal-oriented system. First, a comparison of goal-oriented and conversational system features is led, then a conversion process is described for the fast bootstrap of a new system, finalised with an on-line training of the system's main components. Lilia is dedicated to a chitchat task, where speakers exchange viewpoints on a displayed image while trying collaboratively to derive its author's intention. Evaluations with user trials showed its efficiency in a realistic setup

    LNCS

    Get PDF
    A controller is a device that interacts with a plant. At each time point,it reads the plant’s state and issues commands with the goal that the plant oper-ates optimally. Constructing optimal controllers is a fundamental and challengingproblem. Machine learning techniques have recently been successfully applied totrain controllers, yet they have limitations. Learned controllers are monolithic andhard to reason about. In particular, it is difficult to add features without retraining,to guarantee any level of performance, and to achieve acceptable performancewhen encountering untrained scenarios. These limitations can be addressed bydeploying quantitative run-timeshieldsthat serve as a proxy for the controller.At each time point, the shield reads the command issued by the controller andmay choose to alter it before passing it on to the plant. We show how optimalshields that interfere as little as possible while guaranteeing a desired level ofcontroller performance, can be generated systematically and automatically usingreactive synthesis. First, we abstract the plant by building a stochastic model.Second, we consider the learned controller to be a black box. Third, we mea-surecontroller performanceandshield interferenceby two quantitative run-timemeasures that are formally defined using weighted automata. Then, the problemof constructing a shield that guarantees maximal performance with minimal inter-ference is the problem of finding an optimal strategy in a stochastic2-player game“controller versus shield” played on the abstract state space of the plant with aquantitative objective obtained from combining the performance and interferencemeasures. We illustrate the effectiveness of our approach by automatically con-structing lightweight shields for learned traffic-light controllers in various roadnetworks. The shields we generate avoid liveness bugs, improve controller per-formance in untrained and changing traffic situations, and add features to learnedcontrollers, such as giving priority to emergency vehicles

    Active MR k-space Sampling with Reinforcement Learning

    Full text link
    Deep learning approaches have recently shown great promise in accelerating magnetic resonance image (MRI) acquisition. The majority of existing work have focused on designing better reconstruction models given a pre-determined acquisition trajectory, ignoring the question of trajectory optimization. In this paper, we focus on learning acquisition trajectories given a fixed image reconstruction model. We formulate the problem as a sequential decision process and propose the use of reinforcement learning to solve it. Experiments on a large scale public MRI dataset of knees show that our proposed models significantly outperform the state-of-the-art in active MRI acquisition, over a large range of acceleration factors.Comment: Presented at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 202
    corecore